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Abstract 

We describe fully retroactive dynamic data structures for approximate range report- 
ing and approximate nearest neighbor reporting. We show how to maintain, for any 
positive constant d, a set of n points in M'^ indexed by time such that we can perform 
insertions or deletions at any point in the timeline in O(logn) amortized time. We 
support, for any small constant e > 0, (1 + e)-approximate range reporting queries at 
any point in the timeline in 0(logn + k) time, where k is the output size. We also show 
how to answer (1 + e)-approximate nearest neighbor queries for any point in the past 
or present in O(logn) time. 



1 Introduction 

Spatiotemporal data types are intended to represent objects that have geometric character- 
istics that change over time (e.g., see [T, 28, 34] 



The important feature of such objects is that their critical characteristics, such as when 
they appear and disappear in a data set, exist in a timeline. The representation of such 
objects has a number of important applications, for instance, in video and audio processing, 
geographic information systems, and historical archiving. Moreover, due to data editing or 
cleaning needs, spatiotemporal data sets may need to be updated in a dynamic fashion, 
with changes that are made with respect to the timeline. Some possible examples of such 
dynamic updates include: 

• A video editor may wish to add a 3-second video segment at the 20-second mark in a 
27-second video and have the result be a seamless 30-second video (say, for a television 
commercial) . 

• A credit reporting company might need to remove some historical transactions in a 
consumer's credit report to reflect the fact that these transactions were entered in 
error. 

• Some of the historical trajectories in a collection of GPS traces may need to be changed 
to correct calibration errors that are discovered only after subsequent post-processing. 

Thus, in this paper we are interested in methods for dynamically maintaining geometric 
objects that exist in the context of a timeline. Queries and updates happen in real time, 
but are indexed in terms of the timeline. For instance, one can ask to mark an object to 
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exist for the first time at time index to, that is, to be inserted at time to. Likewise, one 
may ask to mark an object so it is identified as removed as of time index ti, that is, to 
be deleted at time ti. One may also ask to query the data set as of time index t2, say, to 
ask for an approximate nearest neighbor of a point p as of time index t2- These updates 
and queries occur in real time with the intent that the timeline for the data set is made 
immediately consistent after each update and each query is performed correctly with respect 
to the current state of the timeline. 

For example, suppose we have a set of 1-dimensional points, X = {1,4,7,10,13}, so 
that each x £ X is inserted into the timeline at time index t = x and never removed. If 
we then do a nearest-neighbor query for 6 at time index 12, the result would be 7. But if 
we then revise the timeline so that 6 is inserted at time index 6, and we repeat the above 
nearest-neighbor query, then the answer would be 6. Thus, our intent is that each update 
we make to the data set should propagate through the timeline in a consistent fashion, so 
that future queries made with respect to the timeline are answered correctly. Of course, we 
cannot go back in real time to change the answers to queries done in the (real) past with 
respect to the state of the timeline when that query was made. 

In this paper, we are specifically interested in the dynamic maintenance of a set of 
d-dimensional points that appear and disappear from a data set in terms of indices in a 
timeline, for a given fixed constant d > 1. Points should be allowed to have their appearance 
and disappearance times changed, with such changes refiected forward in the timeline. We 
also wish to support time-indexed approximate range reporting and nearest-neighbor queries 
in such data sets. That is, we are interested in the dynamic maintenance of spatiotemporal 
point sets with respect to these types of geometric queries. 



1.1 Related Work 

Approximate Searching. Arya and Mount |6j introduce the approximate nearest neigh- 
bor problem for a set of points, S, such that given a query point g, a point of S will be 
reported whose distance from q is at most a factor of (1 + e) from that of the true nearest 
neighbor of q. Arya et al. 18 show that such queries can be answered in O(logn) time for a 
fixed constant e > 0. Chan 14] shows how to achieve a similar bound. Arya and Mount [T] 
also introduce the approximate range searching problem for a set, S, where a axis-parallel 
rectangle R is given as input and every point in S that is inside by a distance of at least 
e is reported as output and no point of S outside R is reported. Let k be the number of 
points reported. Arya and Mount show that such queries can be answered in 0(logn + k) 
time for fixed constant e > 0. Eppstein et al. [26] describe the skip quadtree structure, 
which supports 0(logn -|- A;)-time approximate range searching as well as 0(logn)-time 
point insertion and deletion. 

Our approach to solving approximate range searching and approximate nearest neighbor 



problems are based on the quadtree structure 42 . In this structure, regions are defined 
by squares in the plane, which are subdivided into four equal-sized squares for any regions 
containing more than a single point. So each internal node in the underlying tree has up to 
four children and regions have optimal aspect ratios. Typically, this structure is organized in 
a compressed fashion [4| |11|12|19] , so that paths in the tree consisting of nodes with only one 
non-empty child are compressed to single edges. This structure is related to the balanced 
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box decomposition (BBD) trees of Arya et al. p^-'S], where regions are defined by hypercubes 
with smaller hypercubes subtracted away, so that the height of the decomposition tree is 
O(logn). Similarly, Duncan et al. |25j define the balanced aspect-ratio (BAR) trees, where 
regions are associated with convex polytopes of bounded aspect ratio, so that the height of 
the decomposition tree is O(logn). 



Computational Geometry with respect to a Timeline. Although we are not famil- 
iar with any previous work on retroactive d-dimensional approximate range searching and 
nearest-neighbor searching, we nevertheless would like to call attention to the fact that in- 
corporating a time dimension to geometric constructions and data structures is well-studied 
in the computational geometry literature. 



Atallah [9] studies several dynamic computational geometry problems, including con- 
vex hull maintenance, for points moving according to fixed trajectories, showing an 
important connection between such problems and Davenport-Schinzel sequences. This 
work has been followed by a large body of subsequent work on additional connections 
between geometry, moving objects, and Davenport-Schinzel sequences. (E.g., see (48|.) 

Subsequently, a number of researchers have studied geometric motion problems in 
the context of kinetic data structures (e.g., see [3|[l0|[32|[33]). In this framework, a 
collection of geometric objects is moving according to a fixed set of known trajectories, 
and changes can only happen in the present. The goal is to maintain a data structure 
that supports geometric queries on this set with respect to the "current" time. As time 
progresses, the data structure needs to be updated, either because internal conditions 
about its state are triggered or because an object changes its trajectory. 



Sarnak and Tarjan |47| and DriscoU et al. 24 introduce the concept of persistent data 



structures, which support time-related operations where updates occur in the present 
and queries can be performed in the past, but updates in the past fork off new timelines 
rather than propogate changes forward in the same timeline. Such structures have 
been used in a number of applications, such as in planar point location, which use 
space-sweeping operations to construct data structures based on a static-to-dynamic- 
to-static framework. 

All of this previous work differs from the approach we are taking in this paper, since in these 
previous approaches objects are not expected to be retroactively changed "in the past." 

Demaine et al. [20] introduce the concept of retroactive data structures, which is the 
framework we follow in this paper. In this approach, a set of data is maintained with respect 
to a timeline. Insertions and deletions are defined with respect to this timeline, so that each 
insertion has a time parameter, t, and so does each deletion. Likewise, queries are performed 
with respect to the time parameter as well. The difference between this framework and the 
dynamic computational geometry approaches mentioned above, however, is that updates 
can be done retroactively "in the past," with the changes necessarily being propagated 
forward. If queries are only allowed in the current state (i.e., with the highest current time 
parameter), then the data structure is said to be partially retroactive. If queries can be 
done at any point in the timeline, then the structure is said to be fully retroactive. Demaine 
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et al. [20] describe a number of results in this framework, including a fully-retroactive 1- 
dimensional structure for successor queries with 0(log^ n)-time performance. They also 
show that any data structure for a decomposable search problem can be converted into a 
fully retroactive structure at a cost of increasing its space and time by a logarithmic factor. 

Acar et al. [2] introduce an alternate model of retroactivity, which they call non-oblivious 
retroactivity. In this model, one maintains the historical sequence of queries as well as 
insertions and deletions. When an update is made in the past, the change is not necessarily 
propagated all the way forward to the present. Instead, a non-oblivious data structure 
returns the first operation in the timeline that has become inconsistent, that is an operation 
whose return value has changed because of the retroactive update. As mentioned above, we 
only consider the original model of retroactivity as defined by Demaine et al. |2d] in this 
paper. 



Blelloch 13 and Giora and Kaplan |31| consider the problem of maintaining a fully 



retroactive dictionary that supports successor or predecessor queries. They both base their 
data structures on a structure by Mortensen [39], which answers fully retroactive one di- 
mensional range reporting queries, although Mortensen framed the problem in terms of two 
dimensional orthogonal line segment intersection reporting. In this application, the x-axis 
is viewed as a timeline for a retroactive data structure for 1-dimensional points. The in- 
sertion of a segment [(xi,y), {x2,y)] corresponds to the addition of an insert of y at time 
xi and a deletion of y at time X2. Likewise, the removal of such a segment corresponds to 
the removal of these two operations from the timeline. For this 1-dimensional retroactive 
data structuring problem, Blelloch and Giora and Kaplan give data structures that support 
queries and updates in O(logn) time. Dickerson et al. ^22j describe a retroactive data struc- 
ture for maintaining the lower envelope of a set of parabolic arcs and give an application 
of this structure to the problem of cloning a Voronoi diagram from a server that answers 
nearest-neighbor queries. 

1.2 Our Results 

In this paper, we describe fully retroactive dynamic data structures for approximate range 
reporting and approximate nearest neighbor searching. We show how to maintain, for any 
positive constant, d > 1, a set of n points in M*^ indexed by time such that we can perform 
insertions or deletions at any point in the timeline in O(logn) amortized time. We support, 
for any small constant e > 0, (1 + e)-approximate range reporting queries at any point in 
the timeline in 0(logn + k) time, where k is the output size. Note that in this paper we 
consider circular ranges defined by a query point q and radius r. We also show how to 
answer (1 + e)-approximate nearest neighbor queries for any point in the past or present in 
O(logn) time. Our model of computation is the real RAM, as is common in computational 
geometry algorithms (e.g., see j44] ) . 

The main technique that allows us to achieve these results is a novel, multidimensional 
version of fractional cascading, which may be of independent interest. Recall that in the 



(1-dimensional) fractional cascading paradigm of Chazelle and Guibas 17,18], one searches 
a collection of sorted lists (of what are essentially numbers), which are called catalogs, that 
are stored along nodes in a search path of a catalog graph, G, for the same element, x. 
In multidimensional fractional cascading, one instead searches a collection of finite subsets 
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of M*^ for the same point, p, afong nodes in a search path of a catafog graph, G. In our 
case, rather than have each catafog represented as a one-dimensional sorted list, we instead 
represent each catalog as a multidimensional "sorted list," with points ordered as they 
would be visited in a space-filling curve (which is strongly related to how the points would 
be organized in a quadtree, e.g., see 12,46]). 

By then sampling in a fashion inspired by one-dimensional fractional cascading, we 
sho'wj^ how to efficiently perform repeated searching of multidimensional catalogs stored at 
the nodes of a search path in a suitable catalog graph, such as a segment tree (e.g., see |44|), 
with each of the searches involving the same d-dimensional point or region. 

Although it is well known that space-filling curves can be applied to the problem of 
approximate nearest neighbor searching, we are not aware of any extension of space-filling 
curves to approximate range reporting. Furthermore, we believe that we are the first to 
leverage space-filling curves in order to extend dynamic fractional cascading into a multi- 
dimensional problem. 



2 A General Approach to Retroactivity 

Recall that a query Q is decomposable if there is a binary operator □ (computable in 
constant time) such that Q{ALI B) = ^{Q{A),Q{B)). Demaine et al. [20] showed that we 
can make any decomposable search problem retroactive by maintaining each element ever 
inserted in the structure as a line segment along a time dimension between the element's 
insertion and deletion times. Thus each point p in R"^ is now represented by a line segment 
parallel to the time-axis in R'^'^^ dimensions. For example when extending the query to be 
fully-retroactive, a one-dimensional successor query becomes a two-dimensional vertical ray 
shooting query, and a one-dimensional range reporting query becomes a two-dimensional 
orthogonal segment intersection query (Figure [T|. 

Thus, we maintain a segment tree to allow searching over the segments in the time di- 
mension, and augment each node of the segment tree with a secondary structure supporting 
our original query in d dimensions. Let S be the set of nodes in the segment tree on a root- 
to-leaf path searching down for t in the time dimension. To answer a fully-retroactive query, 
we perform the same d-dimensional query at each node in S. This transformation costs an 
extra logn factor in space, query time, and update time, which we would nevertheless like 
to avoid. 



Recall that Mortensen 39 and Giora and Kaplan 31 both solve the fully-retroactive 
versions of decomposable search problems, and are both able to avoid the extra logn factor 
in query and update time. Therefore inspired by their techniques, we propose the following 
as a general strategy for optimally solving the fully-retroactive version of any decomposable 
search problem. 

1. Suppose we have an optimal data structure D for the non-retroactive problem which 
supports queries in polylogarithmic time T{n). 

2. Represent each d-dimensional point as a line segment in d + 1 dimensions. 



^The details for our constructions are admittedly intricate, so some details of proofs are given in appen- 
dices. 
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Figure 1: Examples of how a d dimensional query on points becomes a d + 1 dimensional 
query on segments. Here we illustrate how a one-dimensional successor query becomes a 
two dimensional vertical ray shooting query, and a one- dimensional range reporting query 
becomes a two-dimensional segment intersection query. 
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3. Build a weight-balanced segment tree with poly logarithmic branching factor over the 
segments as described by 31 . 



4. Augment the root of the segment tree with an optimal search structure D. 

5. Augment each node of the segment tree with a colored dynamic fractional cascading 
(CDFC) data structure. 

6. Perform a retroactive query at time t by performing the non-retroactive query on the 
non-retroactive data structure at the root of the segment tree, and for each node on the 
search path for t in the segment tree, perform the query in each of the CDFC structures 
(Figure [2]). 

The CDFC data structure must be cleverly tuned to support a colored (but non- 
retroactive) version of the d-dimensional query in 0(r(n) - log log n/ log n) time. In a colored 
query, each element in the structure is given a color, and the query also specifies a set of 
colors. We only return elements whose color is contained in the query set. The colors are 
required because the segment tree has high degree. Each color represents a pair of chil- 
dren of the current node in the segment tree. Thus we encode which segments overlap the 
search path via their colors. Since the segment tree has a polylogarithmic branching factor, 
we spend T{n) time searching at the root and 0(T{n) ■ log log n/ log n) time searching in 
the CDFC structures at each of the 0(loglogn/logn) nodes. Therefore, the total time 
required by a query is an optimal 0{T{n)). Updates follow a similar strategy, but may 
require us to periodically rebuild sections of the segment tree. We can still achieve the 
desired (amortized) update time, and the analysis closely follows [31|. 




Figure 2: An application of our strategy to a fully-retroactive one-dimensional range re- 
porting query. Shaded elements represent elements with colors indicating they are present 
at time t. 

One of the key difficulties in applying this strategy lies in the design of the colored 
dynamic fractional cascading data structure, especially in problems where the dimension 
d > 1. In fact, the authors are not aware of any previous application of dynamic fractional 
cascading techniques to any multidimensional search problem. However, in the following 
we show how techniques using space filling curves can be applied to extend the savings of 
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fractional cascading into a multidimensional domain. First, we apply the above strategy in 
the simpler case when d = 1. Then we extend this result using space filling curves to support 
Fully- Retroactive range reporting queries and approximate nearest neighbor queries in M*^. 
Note that in one dimension a nearest neighbor query reduces to a successor and predecessor 
query. 

Lemma 2.1. There exists a colored dynamic fractional cascading data structure which 
supports updates in 0(loglog A^) amortized time, colored successor and predecessor queries 
in O(loglogA^) worst case time and colored range reporting queries in 0(loglogA^ + k) 
worst case time, where N is the number of elements stored and k is the number of elements 
reported. 

Proof. We extend the generalized union-split-find structure of [31j to also support colored 
range queries. See Appendix [B| □ 



Space Filling Curves. The z-order, due to Morton [40], is commonly used to map 
multidimensional points down to one dimension. By now space filling curves are well- 



studied and multiple authors have applied them specifically to quadtrees (e.g., see 12 



[46] ) and approximate nearest neighbor queries 14, 16, 21 , 37|. However, we extend their 
application to approximate range searching as well. Furthermore, we believe that we are 
the first to leverage space-filling curves to extend dynamic fractional cascading techniques 
to multidimensional problems such as these. 

Lemma 2.2. The set of points in any quadtree cell rooted at [0, 1)^ form a contiguous 
interval in the z-order. 

Proof. Due to Bern et al. [l2]. See Appendix [P] for details. □ 

Lemma 2.3. Let P be a set of points in M"'. Define a constant c = -v/d(4d + 4) + l. Suppose 
that we have d+l lists P + v^ ,j = 0, . . . ,d, each one sorted according to its z-order. We can 
find a query point q's c-approximate nearest neighbor in P by examining the the 2{d + 1) 
predecessors and successors of q in the lists. 

Proof. See Appendix [P) □ 



3 Main Results 

In this section we give our primary results: data structures for fully-retroactive approximate 
range queries and fully-retroactive approximate nearest neighbor (ANN) queries. Recall that 
an approximate range query report{q, r, e, t) defines an inner range Q~ , the region within a 
radius r of the query point q and an outer range Q^, the area within a radius of (1 + e)r 
of q. We want to return all the points inside Q~ and exclude all points outside for a 
particular point in time t. Points that fall between Q~ and at time t may or may not 
be reported. Points not present at time t will never be reported. 

Theorem 3.1 (Fully-Retroactive Approximate Range Queries). For any positive constant 
d> 1, we can maintain a set of n points in M'^ indexed by time such that we can perform 
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insertions or deletions at any point in the timeline in O(logn) amortized time. We support 
for any small constant e > 0, (1 + e)- approximate range reporting queries at any point in 
the timeline in 0{logn + k) time, where k is the output size. The space required by our data 
structure is 0(nlogn/loglogn). 

Proof. We follow the general strategy outlined in Section [2j We augment the root of the 
segment tree with a skip-quadtree, an optimal structure for approximate range and nearest 
neighbor queries in M*^. We also augment each node of the segment tree with a specialized 
CDFC structure which we now describe. 



We extend the CDFC structure from Lemma 2.1 to maintain d-dimensional points such 



that given a query set of colors Cq and d-dimensional quadtree cell, it returns all points 



contained in that cell with colors in Cq. By Lemma 2.2, for all points y, q, z such that y < 
q < z in the z-order, any quadtree cell containing y and z must also contain q. Furthermore, 
for a given quadtree, each cell is uniquely defined by the leftmost and rightmost leaves in 
the cell's subtree. Therefore, the d-dimensional cell query reduces to a one-dimensional 
range query in the z-order on the unique points which define the quadtree cell (Figure |4]) . 
Thus, by leveraging Lemma |2.1| and maintaining the points according to their z-order, we 
support the required query in 0(loglogn + k) time. 

The skip-quadtree tree contains all the points ever inserted into our data structure, 
irrespective of the time that they were inserted or deleted. The inner and outer range of a 
query partition the set of quadtree cells into the three subsets: the inner cells, which are 
completely contained in Q~^, the outer cells, which are completely disjoint from Q~ , and 



the stabbing cells, which partially overlap and . Eppstein et al. 27 showed that a 
skip-quadtree can perform an approximate range query in 0(logn + e^^^ time expected 
and with high probability, or worst case time for a deterministic skip-quadtree. Using their 
algorithm we can find the set / of ©(e^"*^) disjoint inner cells in the same time bound. Based 
on the correctness of their algorithm we know that the set of points we need to report are 
covered by these cells. We report the points present at time t as follows: For each cell i & I, 
if i is a leaf, we report only the points in i, which are present at time t in constant time. 
Otherwise, we find the first and last leaves yo and zq according to an in-order traversal 
of i's subtree in O(logn) time. Then we perform a fully-retroactive range query on cell i 
in the segment tree, using zq and yo to guide the query on cell i in the CDFC structures. 
Correctness follows since a point satisfies the retroactive range query if and only if it is in 
one of the cells we examine and is present at time t. 

For each of the 0(e^~°') cells in / we spend 0(logn + ki) time and so the total running 
time is 0(e^~'^logn + k) = 0(logn + k), since e is a small constant. The skip-quadtree is 
linear space, and thus the space usage is dominated by the segment tree. The total space 
required is 0(n log n/ log log n). □ 

Corollary 3.2 (Fully-Retroactive ANN Queries.). We can maintain, for any positive con- 
stant, d> 1, a set of n points in indexed by time such that we can perform insertions or 
deletions at any point in the timeline in O(logn) amortized time. We support, for any small 
constant e > 0, (1 + e)- approximate nearest neighbor queries for any point in the past or 
present in O(logn) time. The space required by our data structure is 0(n log n/ log log n). 



Proof. By combining the data structure of j31j with Lemma 2.3 and storing the d-dimensional 



points in that structure according to their z-order, we already have a data structure for fully- 
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retroactive c-approximate nearest neighbor queries. However, c is polynomial function of 
d, and for d = 2, c is already greater than 15. In order to support (1 + e) nearest neighbor 
queries for any e > 0, we require the data structure of the previous theorem. 

We know from Lemma 2.3 that we can find a c-approximate nearest neighbor by using 
d + 1 different shifts of the points. Therefore, we augment our data structure so that 
instead of a single CDFC structure at each segment tree node, we keep an array of d + 1 
CDFC structures corresponding to the z-order of each of the d + 1 sets of shifted points. 
Given a point q, we can find a the predecessor and successor of q at time t in each of 
the d + 1 z-orders in O(dlogn) time. Out of these 2{d + 1) points, let p be the point 
that is closest to q. By Lemma 2.3, p is a c-approximate nearest neighbor. Let r be 
the distance between p and q. As observed by multiple authors [5,35,41j, we can find a 
(1 -|- e)-approximate nearest neighbor via a bisecting search over the interval [r/c, r]. This 
search requires 0(log(l/e)) fully-retroactive spherical emptiness queries. We can support 
a retroactive spherical emptiness query in O(logn) time with only a slight modification to 
our retroactive approximate range query. Instead of returning k points in the range, we 
just return the first point we find, or null if we find none. Thus the total time required 
is 0{dlogn + log(l/e) logn) = O(logn) since we assume e and d are constant. The space 
usage only increases by a factor of d when we store the d + 1 shifted lists, and thus the 
space required is still 0(n logn/ log logn). 

□ 
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Appendices 



A Generalized Van Emde Boas (GVEB) Trees 

Theorem A.l. A GVEB structure with parameters {N,M) uses 0{NC) space and can he 
initialized in 0{NC) worst case time, where C = log* M is the number of colors supported. 
It requires a lookup table of size 0{M) and supports insert, delete and colored successor 
(find) queries in O(loglogA^) worst case time. It also supports colored range reporting 
queries (report) in 0(loglogA^ + k) time, where k is the number of elements reported. 

The GeneraUzed van Emde Boas tree (GVEB) is an extension of the data structure by 



van Emde Boas et al. 49 . Recall that a van Emde Boas tree (VEB) is a recursive data 
structure that supports successor and predecessor queries for a fixed (monochrome) universe 
U in 0(log log U) time. A recursive data structure is one that references itself over a smaller 
section of input as a part of its definition. For example, in a binary search tree, an internal 
node V with key k partitions the set of input keys into keys greater than k that are stored 
recursively in the right subtree and keys less than k that are stored recursively in the left 
subtree. Thus the children of each internal node are themselves binary search trees defined 
over a subset of the input. 

To avoid tedious details, we make the typical assumption that U is the set of integers 
in the range [A^] and that has the form 2 for some positive integer ^. Note that 
£ = O(loglogA^) is the number of levels of recursive structures. Each node of the VEB 
tree contains the following four fields: Min, Max, Bottom, and Top. The Min and Max fields 
contain the minimum and maximum elements in the node's subtree. The VEB partitions 
[A'^] into y/N buckets each of size ^/N . These buckets are stored in Top, a single recursive 
structure. Each bucket in Top points to a recursive VEB structure for a universe size of ^/~N 
in Bottom. The recursive structures in Bottom can be thought of as the children of the node, 
and the single structure in Top functions as an efficient way to search these children. If a 
key k is the maximum or minimum integer in a particular recursive structure, then it is just 
stored in the Max or Min fields. Otherwise, it is stored recursively in Bottom[A; div \/iV] as 
k mod \/iV. 

Let N,M be two integers that fit in a machine word, and let C = [log* M]. The 
Generalized van Emde Boas (GVEB) data structure of [31] maintains a set of (key, color) 
pairs {k, c) where k is an integer in [N] and c is an integer in C. Let Cg C C be a set of 
colors. The query f ind(/cq, Cg) returns the successor of kg with color c G Cq, i.e. we want 
to return the minimum key element {k', c') such that k' > kg and c' G Cg. GVEB supports 
find, insert and delete operations in O(loglogiV) worst case time. 

The GVEB tree is similar to the VEB tree, only instead of maintaining a single minimum 
and maximum integer in each recursive structure, we maintain a min and max for each color 
c S C Let G be a GVEB structure. The Min and Max arrays are actually maintained via 
two arrays each (4 total) of size \C\: max-key, max-rank, min-key, min-rank. For each 
c G C, max-key [c] = max{A: \{k,c) G G} and max-rank[c] is the rank of max-key[c]. The 
arrays min-key and min-rank are defined symmetrically. Max and Min also each contain 
a q-heap, max-Q and min-Q, which contain all the elements in Max and Min respectively. 



Recall that the q-heap of 30 can store up to log* M = \C\ elements, requires a lookup 
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table of size M, and supports successor and rank queries in constant time. 

Finally, we precompute a table of size 0{M) that points to the maximum element in G 
with a color in Cq for every possible set Cq and for each possible max-rank array. Together 
these data structures allow us to spend only constant time at each level of the structure 
during a find, insert, or delete operation, so that the time required for each operation is 



O(loglogn). For full details of how these operations work, see section 5.1 of 31 . 

In addition to the insert, delete, and find operations, we also require a one-dimensional 
colored range reporting query. Therefore we extend the GVEB G to also support the query 
report(i, j, Cq), which returns the set of elements {(/c,c) G G\i < k < j,c £ Cq}. We also 
need an internal helper query reportany(i, j, Cq), which, for each color c € Cq such that the 
set {{k,c)\{k,c) £ G,i < k < j} is non-empty, returns exactly one element from that set. 
For each color c such that there exists an element {k, c) G G, we maintain pointers to the 
successor and predecessor of k with color c. We can maintain these pointers in G with no 
additional asymptotic cost. Then to answer a report query, we first perform a reportany 
query, and then follow these pointers to find the rest of the elements in the range. We now 
describe how to perform the reportany query efficiently. 

We will require the data structure from Lemma 3.2 of |39|. 

Lemma A.l. We can maintain an array A indexed by C where each entry in A is an 
integer in [0{N)] such that updates to A can be performed in constant time and such that 
given i and j we can calculate the set report{A,i, j) = {c G C\i < A[c] < j} in constant 
time. The space usage of A is 0{\C\) and A can be initialized in constant time. 

Proof. The structure uses a q-heap and a global lookup table of size 0{N) to maintain in 
a single word a table B indexed by elements in C such that B[C] is the rank of A[c] among 
the elements of A. See [39| for comparable details. This structure is similar to the max-rank 
data structure of the GVEB structure above, but instead of returning the rank of a single 
element, it returns the colors of the elements between two ranks as a single word. □ 



Let min' and max' each be data structures of the type in Lemma A.l We add these 
structures to each recursive GVEB structure G to answer the query reportany(G, j, Cq). 
The following algorithm follows how Mortensen answered the reportany query in his Sn 
data structure [39], but differs in some important details. If C = or i > j, then there 
are no elements to report. Otherwise, we find elements to report from the min-key array 
as follows. Let Cr = report(i, j, min'), and let Cmin = CqCiCr. We repeat this process 
to find elements in the max-key array. Let Cq = Cq\Cr, the subset of query colors that we 
have not already reported, and let Cr = report(i,j, max'), and Cmax = CqCiCr. We report 
the elements min-key [c] for each c G Cmin and max-key [c] for each c G Cmax- Thus we have 
found all colors for which the minimum or maximum element of that color was in the range 
[i,j], but there could be other colors for which there exists an element in the range [i,j] but 
the minimum and maximum elements of that color were both outside the range. Therefore, 
we recursively search for elements with colors in Cq = Cq\{Cmin U Cmax) via the following 
procedure. If i = or j = — 1, we return because there can be no additional elements to 
report-the max and min elements could not have both been outside the range. Otherwise, 
if i div ^/N = j div ^/N, then the range falls entirely inside a single bucket, and we can 
find the remaining elements with a single recursive call: reportany(C.Bottom[i div i/iV],* 
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mod ^/N,j mod \/iV, Cq). Otherwise, the range spans multiple buckets, and we require 3 
recursive calls: 



1. reportany(G.Bottom[i div VN],i mod \/iV, - 1, Cq), 

2. reportany(G.Top, i div \/iV + l,j div y/N - l,Cg), 

3. reportany(G.Bottom[j div ^/N], 0,j mod ^/N, Cq), passing the Cq parameter by ref- 



erence. 



The first recursive call (1) covers the last part of the first bucket, and the third recursive call 
(3) covers the first part of the last bucket. In both (1) and (3), we encounter the stopping 
condition: i = Ooij = N — l, and so we only need to examine min-key, max- key, min' and 
max' . It is only the second recursive call (2) that may require additional recursion. Note 
that some of the elements returned by (2) not be actual elements, but rather pointers to 
recursive structures. So, as a final step we iterate over all the elements returned, and while 
there exists an element of color c that is actually a pointer to some recursive structure T, 
we replace it with T.min-key[c]. 

We now analyze the additional time and space cost of supporting the report query. 



First we note that the data structure of Lemma A.l requires no more space than the Min 



and Max data structures. Therefore, the asymptotic space usage of our augmented GVEB is 



the same as the original GVEB, which was shown by 31 to be 0{N ■ \C\). Second, observe 



that the data structure of Lemma A.l can be updated in constant time, and only needs to 
be updated when we are already making changes on a given level of recursion. Therefore 
updates take O(loglogA^), no more time than in the original GVEB. Finally, we consider 
the time required to perform a report query. 

The first observation is that whenever we perform more than one recursive call in the 
reportany query, only one call requires non-constant work. It's possible that in the replace- 
ment step, r.min-key[c] may point to another substructure T' , and we may need to follow 
O(loglogA^) pointers to finally reach a true element. However, the second observation is 
that for each substructure T that we examine, there are at least two elements that fall in 
the range and which we will ultimately report. Therefore we can answer the report query 
in time O (log log n + k). 
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B Generalized Union-Split-Find (GUSF) 



The generalized Union-Split-Find (GUSF) structure is an extension of the dynamic Union- 
Split-Find (USF) data structure [23y45| . The USF structure maintains a list L of elements 
from an ordered set, some of which are marked. It supports the following operations: 

• FindNext(x) returns the smallest element x' > x such that x' is marked. 

• Add(x, y) inserts a new unmarked element x just before the element y. 

• Remove(x) removes an unmarked element x from the list. 

• Uniiiark(x) (Union) changes x from a marked element to an unmarked element. 

• Mark(a;) (Split) changes x from an unmarked element to a marked element. 

The USF structure can be considered monochrome. Each element is either unmarked 
(white) or marked with a single color (black). The GUSF extends the USF to support 
a whole set of colors C so that each element can either be unmarked, or marked with some 
subset of colors C C C. The query and update operations are extended accordingly. The 
GUSF also supports additional queries: a FindPrev query, which is symmetric to FindNext 
and a Report (yi, 1/2 ) Cq) query, which reports each element y in the list such that yi < y < y2 
once for each color in C{y) n Cg, where Cq is a set of query colors and C{y) is the set of 
colors with which element y is marked. 

Theorem B.l. A GUSF data structure with parameter N uses 0{N) space and supports 
updates in O(loglogA^) amortized time, find queries in O(loglogA^) worst case time and 
report queries in 0(log log N + k) worst case time. The find queries include FindNext{x, Cq) 
and FindPrev{x, Cq) query, which respectively find the successor and predecessor of x with 
color c £ Cq. The query Report{yi,y2,Cq) reports each element y such that yi < y < y2 
once for each color in C{y)r]Cq. Supported updates include Mark{x, c), Unmark(x, c), Add{x) 
and Remove(x). 



Proof. We employ a technique utilized in the Union-Split-Find data structure 23 , 45 



and later adopted by [39] in his data structure for a colored linked list and 31 in their 
Generalized Union-Split-Find data structure. This technique simultaneously removes the 
requirement that the elements have integer keys and reduces the space required by our data 
structure. Let L be our list and let n = |L| < be the number of elements in the list. The 
key idea is to group consecutive elements of the linked list into blocks of size G(log'' n) for 
an appropriate small constant /? > 3. We label the blocks with integer labels such that the 
order of the labels indicates the order of the blocks in L. We maintain these labels using 
the order maintenance structure of Willard fSO] . 

In each block b we build a binary search tree Ti, and store the elements of b in the leaves 
of Tb, ordered by their order in L. We augment the nodes of Tf, according to the techniques 
of [39] and |20| . Each element y G L is associated with a set of colors C{y), where |C(y)| = 
0(1). Thus, for each leaf Vy in Tf, representing an element y G L, we let C{vy) = C{y). For 
each internal node v in Tf, with children u and w, we let C{v) = C{u) U C{w). Hence we 
color each internal node with the union of the colors of its descendant leaves, and we color 



17 



the root of T;, with the union of the colors of ah the elements in block b. We say a block 
has color c if it contains a leaf with color c. Thus the colors of the block b equal the colors 
of the root ri,. For each block we maintain a single array &.allleafs of size |6| = O(log'^n) 
such that for each leaf Vy G b, there is a pointer in 6.allleaf s to Vy. We do not necessarily 
maintain the order of ft.allleaf s. to match the order of leaves in 6, but we do maintain 
a list 6.freelist of unused indices in b.allleaf s. We also keep an array u.leaf s indexed 
by C in each internal node v such that u.leaf s[c] ^ iS v has a descendant leaf with color 
c, and 6.allleaf s[t>.leaf s[c]] is a pointer to one such leaf. Each of the indices in u.leaf s 
is O(loglogn) bits and therefore the entire array f. leafs can be stored in a single word. 
Furthermore, C{v) can be determined in constant time by examining v. leafs. Lastly, for 
each leaf Vy, and color c S C{vy), we keep a pointer to predecessor and successor of Vy with 
color c, thus creating a linked list over the leaves for each color c G C(rfe). 

We represent each block b by the root ri, of the tree T^. We keep the roots in a linked 
list B. Since each block has size G(log'^n), \B\ = 0(n/log^n). We assign the roots 
integer labels according to their order using the order maintenance structure of Willard [50^ , 
denoted OM. We store the roots of the blocks in a GVEB structure (see Appendix |A]) G 
with parameters {N,AI) according to their integer labels, where N = Q{\B\). Given a 
root with integer label k and colors C{rb), we store one element (/c,c) in G for each color 
c G C{ri,). OM requires linear space. Insertions and deletions to OM can be performed in 
0(log^ \B\) worst case time, and may cause 0(log^ \B\) integer labels to be updated. 

• FindNext(x, Cg). This algorithm resembles that of jsl]. However, instead of having 
a bit-vector C{v) for each internal node v, C{v) is represented implicitly by and 
calculated in constant time from the array f .leaf s. We describe the algorithm here 
for completeness. If C{x) n Cg 7^ 0, then return x. Otherwise x is contained in some 
block b, and therefore in a tree Tb. We traverse the path from x to rh, and look for 
the first node y hanging to the right of this path such that Cq H C(y) 7^ 0. If such a 
node y exists, then we find the leftmost leaf in y's subtree with color c £ Cg as follows. 
Let u be a the current node we're examining. If u is a leaf, then return it. We repeat 
the following until we reach a leaf. Let ve and Vr be the left and right children of v 
respectively. If Cg Ci C{ve) 7^ 0, then set v = V£. Otherwise set v = Vr- Clearly if a 
there exists a successor of x in with color c & Cg we will find it. If not, then we 
find the block 6' with a color in Gg with a query find query on G. If V exists then we 
repeat the above process to find the leftmost leaf with a color in Gg in T^. If not, then 
there is no successor of x with color c£ Cg. It takes O (log log n) time to perform the 
query in each block, and 0(loglog A^) = O(loglogn) time to perform the query in G. 
Therefore the total time required is O(loglogn). 

• FindPrev(x, Gg) finds the predecessor of x with color c G Gg instead of the successor. 
It is supported symmetrically. 

• Report (yi, y2, C*!})- We handle the report query as described by j39|. This algorithm 
reports each element y G L such that yi < y < 2/2 once for each color in G{y) H Gg. 
Since the number of colors in G{y) is constant, we can easily modify this procedure to 
only report each element once. Let Ti be the tree for the block containing yi and let 
T2 be the tree for the block containing 1/2 • We begin by searching bottom up in Ti and 



18 



T2 to report elements to the right of yi in Ti and to the left of 1/2 in T2 with colors in 
Cq. We can report these elements in 0(loglogn + k) time using the allleaf s array, 
leafs arrays, and the leaf lists. If Ti 7^ T2, then we also perform a report query in G 
to find all the blocks between Ti and T2 with colors m c G Cq. For each such block b 
and color c with root rb, we can use leaf s[c] and the leaf list for c to report all such 
leaves in constant time per element. 



Mark(x, c), Unmark(x, c). This is analogous changing the color of an element x in 39 . 
X is stored as a leaf in some tree Ti,. We spend O(loglogn) time updating the nodes 
on the path from x to the root ri,- If this adds or removes a color from C(rft), we make 
the corresponding insertion or deletion from G in O(loglogn) time. We also need to 
update the leaf list for color c in Tf,. Since we can find the successor and predecessor 
of X with color c in O(loglogn) time, we can make this update in O(loglogn) time. 

Add(x). Find the block b where x belongs in O(loglogn) time, and insert it into Tf, 
in O(loglogn) time. Remove an index i from b.freelist and point b . allleaf s [i] 
to X in constant time. Note that insertion may require a block to become too large. 
However, we can split b into two blocks and amortize the cost of splitting as described 
in [31] or |39|, so that the amortized cost of an insertion is still O (log log n). 



• Remove(x). We do lazy deletion. Instead of removing an element from Tb we mark 
it as deleted. This requires periodic rebuilding of the entire structure, but does not 
increase the amortized asymptotic time of the operations. 

Now we consider the space usage of our structure. Recall that the space required by a GVEB 
structure G with parameters N,M is 0{N\C\) where G = [log* D]. We can periodically 
rebuild G so that N = 9{n/ log^ n) and G = 0(log4 n). Since we set /3 > 3, the total space 
usage is 0(n). The data structure also requires a single lookup table of size 0{M). In our 
application, M = 0{n) as well. □ 
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C One-Dimensional Retroactive Queries 



In this section we build on the GUSF data structure of Section [B| and show that we can 
support one-dimensional fully-retroactive range reporting and nearest neighbor queries in 
logarithmic time. Note that in one dimension a nearest neighbor query reduces to a successor 
and predecessor query. 

Let i? C M be a set of points we maintain over time. Each point with value y, insertion 
time a and deletion time b is represented by a horizontal line segment from {a,y) to {b,y), 
which we store as the triple {a,b,y). We construct a modified segment tree as a weight 
balanced B-tree T with branching parameter log^ n for a small constant 5. The leaves of 
the tree represent the endpoints of a segment. A segment s = {a, b, y) is stored at the leaves 
for its endpoints a and b. In addition, we associate s with all the nodes on the paths from 
a and b to the root of T. 

For each internal node v, we maintain a set of segments of associated with t> in a GUSF 
with parameter = 0(n) denoted by M{v). Each segment s = {a,b,y) is associated with 
a color c{s, v) £ C that represents the children of v that are ancestors of a or b. We observe 
that if V has A children, then we require O(A^) colors. Therefore we set 6 = (1/4) logs n, 
which limits the number of children so that the number of colors required is at most |C| = 
log* n. We will use the color of segments to guide both queries and updates. We have 
the following possible colors. For each distinct pair u, w of children of u, we have a color 
c{u,w). For each child n of u we also have colors c{u,u) ce{u) and Cr{u). We assign the 
color c(s, v) of segment s = (a, b, y) as follows. If either a or 6 are descendants of u, then 
there must be a child of Va or Vb respectively which contain these endpoints. If both a 
and b are descendants of v, we let c(s, v) = c{va, Vh). Otherwise if only the left endpoint a is 
a descendant of v, we let c{s,v) = ce{va)- Similarly if only right endpoint 6 is a descendant 
of V we let c(s, v) = Cr{vh). Thus, if we consider each internal node to represent the range of 
values of its leaves, these colors indicate whether s is contained in the range of v or whether 
the segment is cut on the left or right by the range. If s either does not intersect the range 
of V or spans the range of v then neither of its endpoints is a descendant of v and so it is 
not stored in M{v). 

We maintain the following three tables of |31J to facilitate efficient updates and queries 
on M{v). 

• U{v) is used to update M{v). It maps each pair of children u,w to a color c{u, w). It 
also maps each child u to the colors c(n, u), Ci{u) and Cr{u). 

• Q{v) is used to query M{v). It maps each child u to a set of colors Q{v)[u], all the 
colors that correspond to segments that span the range of u. Thus for each sibling r 
to the right of u and each sibling i to the left of u, Q{v)[u\ contains the colors c(£, r), 
ce{£), and Cr(r). 

• F{v) replaces the fractional cascading structure used by [31] in their previous imple- 
mentations. It maps each child u oiv to the set of colors F{v)[u] = {ci{u)}U{cr{u)}U 
{c{u,u)} U {c{u,w)\u / is a child of v}, which corresponds to the set of segments 
for which at least one endpoint is a descendant of v. 
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Let r be the root of our segment tree T. Note that M(r) contains all segments in T. We 
augment r with a balanced search tree T^. We augment all internal nodes with the tables 
described above. Given T, with the above augmentations, Giora and Kaplan 31 proved 
the space required is only 0(n log n/ log log n), and described how to perform insertions, 
deletions, in O(logn) amortized time and fully-retroactive successor queries in O(logn) 
time. We now show how to perform fully-retroactive (one dimensional) range reporting 
queries in 0(logn + k) time. 

Given a query report(t, y, z) we can report all the points present at time t with values 
between y and z as follows. Let r = vq, vi,V2, ■ ■ ■ vt be the search path for t in T. First we 
search for the successor of y, yo and the predecessor of z, zq in T^. This gives us pointers to 
the smallest and largest elements inside the range, (see Figure [2]). Then for each i in < i < 
k we report all the segments in M {vi)\M (vi+i) by calling M{vi).report{y, z,Q{vi)[vi+i]), 
and we find yi+i and Zj+i with two queries: FindNext(?/i, F {vi)[vi-\-i]) and FindPrev(zj, F{vi)[vi-^-i]). 
Note that all segments of M(f j+i) are contained in M{vi) and their colors correspond exactly 
to the set F{vi)[vi-\-i]. Hence yi+i and Zj+i are the smallest and largest values respectively 
of any of the remaining unreported segments that could possibly intersect the range. There- 
fore all segments within the range [y,z\ in Af(f j)\M(wj+i) will be reported for each and 
correctness follows. To bound the time required, we observe that we will search twice in 
Tj. and perform one report and two find queries for each node on the search path to t. 
Therefore we spend O(logn) time in the root, and O (log log n + ki) time for each of the 
nodes visited in T. Since the number of nodes on the search path is limited by the height 
of T to O (log n/ log log n), the total time is 0(logn + k). 
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D Space Filling Curves 



Space Filling Curves (SFCs) were introduced by Peano |43J as a way to map the unit interval 
onto the unit square. As a result, two-dimensional SFCs are sometimes referred to as Peano 
curves. SFCs are commonly used in computer science to map multidimensional points down 
to one dimension. They have the desirable property that points that are close to each other 
along the curve are often also close to each other in the original multidimensional space. By 
now space filling curves are well-studied and multiple authors have applied them specifically 



to quadtrees (e.g., see 12,46]) and approximate nearest neighbor queries 14,16,21,37 
The two most well known SFCs are the Hilbert Curve introduced by Hilbert 36 , and the 
z-curve, proposed by Morton ^40j. We call the sorted order of points with respect to their 
position along the Hilbert or z-curve the Hilbert order or z-order respectively. Some authors 
(e.g., see have suggested that the Hilbert order achieves better clustering in practice. 
However, it is more complex to compare the relative positions of two points in the Hilbert 
order. The z-order has the advantage that comparison of two points only requires bitwise 
exclusive-or. Thus in this paper we consider the z-order. 

The z-order is sometimes called the shuffle order, because a point's position along the 
curve can be found by interleaving the bits of its coordinates. The metaphor is that in 
two dimensions we shuffle together the bits of the x and y coordinates as if they were two 
halves of a deck of cards. Let P be the set of d-dimensional input points. Assume that each 
coordinate of each input point p G P can be represented in binary by a w-hit word. Let pi 
denote the i-th coordinate of p and let pij denote the j-th bit of pi. Thus pi is represented 
as pi^u, ■ ■ -pi^i in binary. Define the shuffle a{p) to be the number pi^^, • • -pd^w ■ ■ -Pi,! ■ ■ -Pd,! 
in binary Let Pz = {a{p)\p G P}, the z-order of P. For any two points p,q G P, Chan [Ts] 
has shown that we can determine their order in Pz in 0(d) time as follows: 

i = 1 

for j = 2 to d 

if \pi ®qi\< \pj e qj\ 

i = j 
return pi < qi 

where © denotes bitwise exclusive-or and | • | denotes [log2 x\ . However, we can compute 
|x| < \y\ without using the | • | operator, by replacing \x\ < \y\ with this line of code. 

a X > y return false else return x < x (B y 

Thus we can compare p and q using only the ® operation. 

Although points that are close to each in along an SFC are close to each other in the 
original multidimensional space, the converse it not always true. Points can be close to each 
other in M*^ but not close to each other along the curve. Thus in general, points that are 
close to each other on the curve may not be nearest neighbors in M'^. In the rest of this 
section, we will show that by keeping d+1 shifted versions of the z-order, we can guarantee 
that for any point q, a c-approximate nearest neighbor can be found among the successors 
and predecessors of q in one of these z-orders. Here c is a constant that depends on d. 



Recall a lemma of Bern et al. 12 , which shows an important relationship between 
quadtree cells and the z-order. 
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Lemma D.l. The set of points in any quadtree cell rooted at [0,1)'^ form a contiguous 
interval in the z- order. 



Proof. For ease of exposition, we give the proof for d = 2, but it extends naturally to higher 
dimensions. Let p S [0, 1)^. We observe that the first two bits of (t{p) determine which 
quadrant of the quadtree contains p, as shown in Figure[3} Call this quadrant Qi. Similarly, 
the next two bits of p determine which quadrant of Qi contains p. Inductively, if p is in a 
quadtree cell Qi determined by the first 2i bits of p, then the next two bits of p determine 
which quadrant of Qi contains p. Thus, there is a bijection between a path in the quadtree 
from the root to a cell on level i and the first 2i bits of p. All points within the same 
quadtree cell on level i must have the first 2i bits in common. Suppose Qi is a quadtree 
cell, and that / is the set of points in the z-order contained in Qi. As we have shown, these 
points must share the first 2i bits. Now suppose we have a point q ^ Qi. Then one of the 
first 2i bits of a{q) must differ from the first 2i bits of I. Therefore, cr(q) must either come 
before I or after I in the z-order. It cannot occur between two elements of I. Otherwise 
cr{q) would not have had a bit that differs from the first 2i bits in /, which would imply 
that q £ Qi. □ 

We say that two points p, q belong to the same r-grid cell iff p div r = q div r. We say 
that a point p is a-central in its r-grid cell iff for each integer i G we have ar < pi 

mod r < (1 — a)r, or equivalently {pi + ar) mod r > 2ar. The following lemma is due to 
Chan PI. 

Lemma D.2. Suppose d is even. Letv^^^ = {j/{d+l),...,j/{d+l)) eR'^. For any point 
p G M'^ andr = 2~^{i G N), there exists j G {0, 1, . . . d} such thatp+v^ is {l/(2d+2)) -central 
in its r-grid cell. 



The next lemma was proven by 37 for the Hilbert curve, and |38| for the z-curve. 



Lemma D.3. Let Ce{q) denote the closed hypercube with edges of length 2e. centered on q. 
Given a point q G [0, 1)'^ and e, where < e < l/{2d -|- 2), there exists a j,0 < j < d such 
that Ce{q + v^) is contained in an r-region with r satisfying r/{4d + 4) < e < r/{2d -\- 2). 



Given the previous lemmas, Liao et al. [37] showed the following theorem. We leverage 
this result to achieve fully-retroactive (1 -|- e)-approximate nearest neighbor queries. 

Theorem D.4. Let P be a set of points in M"*. Define a constant c = ^/d{Ad -\- A) -\- 1. 
Suppose that we have d-\-l lists P-\-v^ ,j = 0, . . . ,d, each one sorted according to its z-order. 
We can find a query point q 's c- approximate nearest neighbor in P by examining the the 
2(d-\- 1) predecessors and successors of q in the lists. 

Proof. Let 5 be the distance between a query point q and its true nearest neighbor p € P. 
By Lemma D.l , the r-region Qj that contains both q and p contains all points in the z-order 
between q and p . Thus Qj must contain the predecessor or the successor of q. Furthermore, 



by Lemma D.2, there exists an r-region that contains the hypercube Cs{q -\- v^*)) such that 
r/{4d -\- 4) < 6 < r/{2d -\- 2), for some < i < d. Therefore one of the 2(d -\- 1) successors 
and predecessors is at most distance r < ^/d{4d -\- 4)5 to q. □ 
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Figure 3: Suppose p G [0, 1)^. The next two bits in the shuffle of p correspond to which 
quadrant of the cell contains p. 




Q...Q..Q...Q..Q..Q...Q..Q..Q Q . . Q . . Q . . . Q . . Q . . Q Q 



Figure 4: The z-order curve corresponds to the order the leaves would be visited in an 
in-order traversal of the quadtree. We can store the points in a linked list in this order. 
Any element between two elements i,j in the linked list must fall in the same quadtree cell 
as i and j. 



24 



